@ngholiza
Collaborator

No description provided.

ngholiza and others added 23 commits June 15, 2026 13:56
Generalize the Upload section beyond the canonical etcher:

- Add generic equipment_runs table (JSONB inputs/outputs) with RLS mirroring
  etcher_runs, plus runtime ensure migration and SQL scripts.
- Route Process: etcher -> sync_runs_pg (unchanged); any other equipment ->
  sync_equipment_runs_pg with a generic CSV->rows builder driven by the
  equipment config.
- Surface generic runs in /runs list and detail via a disjoint run_id offset
  so the existing Data Catalog renders them with no frontend rewrite.
- Template generator now includes registered outputs/parameters, not just
  features/targets.
- Fix the previously dead min/max validation (read features id/min, parameters
  min_value, targets.constraints) and make out-of-range a non-blocking warning.
- Upload page wording is equipment-generic and shows range warnings on success.
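
The disjoint run_id offset mentioned above can be sketched roughly as follows. This is only an illustration: the constant `GENERIC_RUN_ID_OFFSET`, its value, and both helper names are hypothetical and do not come from the commit.

```python
# Hypothetical offset; the real value is not stated in the commit message.
GENERIC_RUN_ID_OFFSET = 1_000_000_000


def catalog_run_id(source: str, row_id: int) -> int:
    """Map a table-local id to a catalog-wide run_id that cannot collide."""
    if source == "etcher_runs":
        if row_id >= GENERIC_RUN_ID_OFFSET:
            raise ValueError("etcher id would collide with the generic range")
        return row_id
    if source == "equipment_runs":
        # Generic runs live in their own id range above the offset.
        return GENERIC_RUN_ID_OFFSET + row_id
    raise ValueError(f"unknown source table: {source!r}")


def resolve_run_id(run_id: int) -> tuple:
    """Invert catalog_run_id: return (source table, table-local id)."""
    if run_id >= GENERIC_RUN_ID_OFFSET:
        return ("equipment_runs", run_id - GENERIC_RUN_ID_OFFSET)
    return ("etcher_runs", run_id)
```

Because both tables share one integer id space, a list or detail view that only knows about `run_id` can keep working without learning about a second table.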

Co-authored-by: Cursor <cursoragent@cursor.com>
- Skip the template value-type descriptor row during ingestion (both etcher
  and generic builders) so a filled-in template processes cleanly.
- Attribute generic runs to their own equipment_id and resolve the display
  name via equipment_metadata instead of the project's equipment.
- Surface generic measured outputs in the catalog run-detail page and render
  generic input set-points (strings or numbers); add outputs to V2Run.
- Count generic equipment_runs in equipment inventory run metrics.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Explicit-column templates now also include lot/timestamp keys and registered
  input parameters so no run metadata is silently dropped.
- process_upload rejects linking a run to a project whose equipment differs
  from the upload's equipment (empty project equipment stays unconstrained).
- run_to_jsonld serializes generic measured outputs and labels the dataset by
  its actual equipment instead of always "Etcher".

Co-authored-by: Cursor <cursoragent@cursor.com>
- _coerce_optional_float rejects NaN/Infinity so generic ingestion stores the
  raw token as text instead of emitting invalid JSONB and 500-ing.
- _fetch_equipment_run_metrics aggregates on the superuser connection so
  inventory run counts/last-data are complete fleet-wide rather than being
  hidden by RLS for private/shared projects.
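
A minimal sketch of the NaN/Infinity rejection described above, assuming the helper takes a raw CSV token. The names `coerce_optional_float` and `store_cell` are illustrative and are not the project's actual `_coerce_optional_float` signature.

```python
import json
import math
from typing import Optional, Union


def coerce_optional_float(token: str) -> Optional[float]:
    """Return a finite float, or None for empty, non-numeric, or non-finite tokens."""
    text = token.strip()
    if not text:
        return None
    try:
        value = float(text)
    except ValueError:
        return None
    # float() happily parses "NaN" and "Infinity", so reject them explicitly.
    return value if math.isfinite(value) else None


def store_cell(token: str) -> Union[float, str]:
    """Keep the number when it is finite, otherwise keep the raw text."""
    value = coerce_optional_float(token)
    return value if value is not None else token


# allow_nan=False makes json.dumps raise on NaN/Infinity, which is the
# strictness JSON consumers generally expect; the text fallback passes.
payload = json.dumps({"pressure": store_cell("NaN")}, allow_nan=False)
```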

Co-authored-by: Cursor <cursoragent@cursor.com>
- Scope the catalog Download button explicitly to the etcher ML export
  (label + tooltip + filename) since that endpoint is the model-specific
  training matrix; generic runs remain browsable in the catalog.
- Validate generic run timestamps and required lot/timestamp cells during
  processing so malformed dates surface as actionable validation errors
  instead of a database-sync 500; store normalized ISO timestamps.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Required upload columns are now the equipment's declared inputs (explicit
  columns, else features+parameters); outputs/targets and synthetic
  placeholder targets (primary_metric/secondary_metric) are never required,
  so valid CSVs for newly registered equipment are no longer rejected.
- Generic ingestion coerces each cell by its registered type, preserving
  zero-padded string ids, integers, and booleans instead of converting every value to float.
- run_to_jsonld / _load_domain_config fall back to the equipment's PG
  config_json so FAIR metadata for PG-registered equipment carries units,
  ontology, hardware, and owner info.
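
Per-type cell coercion of the kind described above might look like the sketch below. The accepted type names and the boolean token sets are assumptions, and `coerce_typed` here is a stand-in rather than the project's `_coerce_typed`.

```python
from typing import Union

# Assumed spellings; the real accepted tokens are not listed in the commit.
TRUE_TOKENS = {"true", "yes", "1"}
FALSE_TOKENS = {"false", "no", "0"}


def coerce_typed(token: str, declared_type: str) -> Union[str, int, float, bool]:
    """Convert one CSV cell according to the column's declared type."""
    text = token.strip()
    kind = declared_type.lower()
    if kind in {"string", "str", "text"}:
        # Strings are kept verbatim, so "007" stays "007".
        return text
    if kind in {"int", "integer", "long"}:
        return int(text)  # raises ValueError for "1.5" or "abc"
    if kind in {"bool", "boolean"}:
        lowered = text.lower()
        if lowered in TRUE_TOKENS:
            return True
        if lowered in FALSE_TOKENS:
            return False
        raise ValueError(f"not a boolean: {token!r}")
    # Anything else is treated as a float in this sketch.
    return float(text)
```

For example, `coerce_typed("007", "string")` keeps the zero padding, whereas a float-everything approach would turn it into `7.0`.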

Co-authored-by: Cursor <cursoragent@cursor.com>
- equipment_runs gains a stable (upload_id, row_index) identity; reprocessing
  an upload now upserts in place (preserving each run's id/run_id and catalog
  URL) and prunes only rows dropped by a shorter re-upload.
- _load_domain_config returns the file config only when present so the PG
  config_json fallback actually runs for database-only equipment.
- _get_feature_ontology_map now indexes registered outputs so their units and
  QUDT metadata appear in FAIR JSON-LD.
- Run-detail page renders boolean/string generic measurements explicitly
  instead of rendering nothing.

Co-authored-by: Cursor <cursoragent@cursor.com>
- run_to_jsonld selects the etcher target fallback by equipment_id, so a
  non-etcher run with no outputs no longer emits null AvgEtchRate/RangeEtchRate
  properties and is correctly labeled.
- Template descriptor-row detection now requires the complete hint row and is
  only applied to the first data row, so a sparse data row that happens to equal
  a type token is never silently dropped.

Co-authored-by: Cursor <cursoragent@cursor.com>
_required_upload_columns now always requires the lot and timestamp metadata
columns plus declared inputs (features + parameters), and excludes outputs even
when a legacy config lists them as explicit columns. This stops anonymous/
undated catalog runs for new equipment and stops wrongly requiring output
columns (or omitting appended parameters) for explicit-column configs.

Co-authored-by: Cursor <cursoragent@cursor.com>
- upload_file validates registered equipment against _required_upload_columns
  always (even with explicit columns), so an inputs-only CSV is processable.
- Generic ingestion reports an error for non-empty cells that cannot match
  their declared numeric/boolean type instead of silently storing bad data.
- Template descriptor-row detection only compares hint columns present in the
  uploaded row, so a trimmed inputs-only template is still recognized.
- Range warnings recorded at validation time are carried through processing
  (errors_json) and returned, keeping the processed-upload warning UI working.

Co-authored-by: Cursor <cursoragent@cursor.com>
- upload_file validates canonical-etcher CSVs against the full alias-aware
  REQUIRED_SYNC_COLUMNS (including outputs) so an upload accepted at validation
  time can actually be processed by _csv_upload_to_sync_rows.
- process_upload always returns errors as an array (empty when warning-free),
  restoring the API contract the upload page relies on (result.errors.length).

Co-authored-by: Cursor <cursoragent@cursor.com>
_declared_type_violation now flags non-integral values (e.g. 1.5) for columns
declared int/integer/long, so fractional data can't be stored in fields the
equipment schema marks as integers.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Template value-type hints now honor declared parameter/output types
  (string/boolean/int), so a non-float field is no longer advertised as float.
- etcher_runs always report equipment_id/name as the canonical etcher (even
  when the project has no equipment association), so FAIR JSON-LD keeps etch-rate
  results and etcher labeling instead of degrading to a generic run.

Co-authored-by: Cursor <cursoragent@cursor.com>
…utput fixes

- etcher_runs are unconditionally identified/aggregated as the canonical etcher
  (data_loader_pg base select and metadata_pg metrics), so a project associated
  with another tool (or none) can't misattribute or drop canonical runs.
- Template hints prefer a non-float declared type over a unit, so boolean/int/
  string fields advertise their real type while numeric fields keep unit hints.
- Generic ingestion omits None-valued input/output cells, so inputs-only or
  partially-filled uploads don't store misleading {"result": null} measurements.

Co-authored-by: Cursor <cursoragent@cursor.com>
get_summary_stats_pg now aggregates etcher_runs together with equipment_runs
for total/clean/outlier counts and the date range, so processing a non-etcher
upload updates the dashboard's run statistics instead of leaving them unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>
_get_feature_ontology_map now indexes registered input parameters (unit, QUDT,
prov_direction: input), so generic equipment that declares inputs under
parameters rather than features keeps its units and provenance in FAIR JSON-LD.

Co-authored-by: Cursor <cursoragent@cursor.com>
…etadata

PG-registered equipment stores manufacturer/model/serial_number/location at the
config top level, but run_to_jsonld reads them from config["domain"].
_load_domain_config now normalizes the PG config so these hardware/identity
fields are backfilled into domain, keeping them in generic-run FAIR JSON-LD.

Co-authored-by: Cursor <cursoragent@cursor.com>
…jection

- Merged etcher/equipment run pages sort on the actual instant (offset-aware
  datetimes normalized to UTC) instead of raw ISO strings, so runs with
  different UTC offsets paginate in true chronological order.
- Generic ingestion rejects structurally malformed rows (extra cells under the
  DictReader None key, or short rows leaving columns at the None restval) so a
  directly-invoked process call can't turn a malformed CSV into a catalog run.
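
The malformed-row check relies on documented `csv.DictReader` behavior: surplus cells are collected under the `restkey` (default `None`) and missing cells are filled with `restval` (default `None`). A small sketch, with hypothetical function names and sample columns:

```python
import csv
import io
from typing import Optional


def malformed_row_reason(row: dict) -> Optional[str]:
    """Explain why a DictReader row is structurally malformed, or return None."""
    if None in row:
        # Extra cells are gathered into a list under the None key.
        return "row has more cells than the header"
    if any(value is None for value in row.values()):
        # Missing trailing cells are filled with the None restval.
        return "row has fewer cells than the header"
    return None


def split_rows(csv_text: str):
    """Separate well-formed rows from (row_index, reason) errors."""
    good, errors = [], []
    for index, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        reason = malformed_row_reason(row)
        if reason is None:
            good.append(row)
        else:
            errors.append((index, reason))
    return good, errors


sample = (
    "lot,timestamp,pressure\n"
    "A1,2026-06-15T10:00:00Z,5\n"
    "A2,2026-06-15T11:00:00Z\n"
    "A3,2026-06-15T12:00:00Z,7,extra\n"
)
good_rows, row_errors = split_rows(sample)
```

Here only the `A1` row survives; the short and long rows are reported with their zero-based data-row index.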

Co-authored-by: Cursor <cursoragent@cursor.com>
…round-trip

Add _coerce_optional_int and use it for int/integer/long columns in both
_coerce_typed and _declared_type_violation, so 64-bit identifiers beyond
2**53 keep full precision instead of being silently rounded via float.

Co-authored-by: Cursor <cursoragent@cursor.com>
…nfigs

- _coerce_optional_int parses decimal-form integers (e.g. 9007199254740993.0)
  via Decimal instead of binary float, preserving 64-bit values; non-finite
  Decimals are rejected.
- _load_domain_config normalizes file-backed/registered JSON configs too, so
  manufacturer/model/serial/location stored at the config top level appear in
  FAIR metadata for equipment that has a JSON snapshot, not just PG-only equipment.
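
To illustrate why the integer path goes through `Decimal`, here is a minimal sketch. The function name mirrors the commit, but the body is an assumption about its behavior, not the project's code.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def coerce_optional_int(token: str) -> Optional[int]:
    """Parse an integer cell exactly; return None if empty, non-integral, or non-finite."""
    text = token.strip()
    if not text:
        return None
    try:
        value = Decimal(text)
    except InvalidOperation:
        return None
    if not value.is_finite():
        # Decimal accepts "NaN" and "Infinity"; neither is a valid integer.
        return None
    if value != value.to_integral_value():
        # Fractional values such as 1.5 violate an int/integer/long column.
        return None
    return int(value)


exact = coerce_optional_int("9007199254740993.0")   # 2**53 + 1, decimal form
via_float = int(float("9007199254740993.0"))        # rounded by binary float
```

The float route lands on 9007199254740992 because 2**53 + 1 is not representable as a double, while the `Decimal` route keeps every digit.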

Co-authored-by: Cursor <cursoragent@cursor.com>
…ions

Add _effective_equipment_config to backfill a registered equipment's explicit
column schema from its top-level columns (columns_json) when config_json is
empty/incomplete. upload_file and _load_equipment_config both use it, so the
upload schema validation and generic processing no longer accept or drop files
that omit registered columns. Restores test_upload_reports_missing_required_columns.

Co-authored-by: Cursor <cursoragent@cursor.com>
Merged run pagination now returns an empty page for limit=0 (matching the prior
SQL LIMIT 0 semantics) instead of treating any falsy limit as unlimited from the
offset; limit=None remains unlimited and limit>0 returns the requested window.
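
The distinction between `limit=0` and `limit=None` can be sketched as below. The `paginate` helper is hypothetical and operates on an already merged, already sorted list.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def paginate(items: Sequence[T], limit: Optional[int], offset: int = 0) -> List[T]:
    """Return one page of items, treating None as unlimited and 0 as empty."""
    # A check like `if not limit:` would lump 0 together with None and
    # return everything from the offset; compare against None instead.
    if limit is None:
        return list(items[offset:])
    return list(items[offset:offset + limit])


runs = ["r1", "r2", "r3", "r4", "r5"]
empty_page = paginate(runs, limit=0, offset=1)
rest = paginate(runs, limit=None, offset=2)
window = paginate(runs, limit=2, offset=1)
```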

Co-authored-by: Cursor <cursoragent@cursor.com>
Generic ingestion now parses is_outlier, is_calibration_recipe, and outlier_type
(via the standard column aliases) into the top-level row passed to
sync_equipment_runs_pg, so uploaded generic runs keep their quality metadata
instead of defaulting to false/empty and being misclassified in catalog filters,
badges, and summary outlier counts.

Co-authored-by: Cursor <cursoragent@cursor.com>
@ngholiza ngholiza merged commit 81ac864 into dev Jun 16, 2026